How to use this manual

The 3D RNA-seq pipeline includes complex steps of expression data generation, data pre-processing, 3D analysis and result summary. This manual provides step-by-step instructions to use the 3D RNA-seq App. To use this App in your work, please cite:

Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H., Zhang,R., and Brown,J.W.S. (2018) Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome. Plant Cell.

Command-line based 3D analysis manual:

Installation and Run

ThreeDRNAseq R package can be installed from Github by using devtools R package

Install ThreeDRNAseq package

##################################################################################################
## use devtools R package to install ThreeDRNAseq from Github
###---> If devtools is not installed, please install
if(!requireNamespace("devtools", quietly = TRUE))
  install.packages('devtools')

###---> Install ThreeDRNAseq
devtools::install_github('wyguo/ThreeDRNAseq')

Install dependency packages

#######################################################################################################
## Install packages of dependency
###---> Install packages from Cran
cran.package.list <- c('shiny','shinydashboard','rhandsontable','shinyFiles','shinyjs','DT',
                       'plotly','ggplot2','eulerr','ggrepel',
                       'gridExtra','fastcluster','Gmisc','rmarkdown')
for(i in cran.package.list){
   if(!(i %in% rownames(installed.packages()))){
     message('Installing package: ',i)
     install.packages(i,dependencies = T)
   } else next
}

###---> Install packages from Bioconductor
bioconductor.package.list <- c('tximport','edgeR','limma','RUVSeq','ComplexHeatmap','rhdf5')
for(i in bioconductor.package.list){
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  if(!(i %in% rownames(installed.packages()))){
    message('Installing package: ',i)
    BiocManager::install("tximport", version = "3.8")
  } else next
}

Run 3D RNA-seq App

Please type the command in your RStudio and the App will be lauched in default web broswer (Google Chrome is recommended for image visualisation issues).

library(ThreeDRNAseq)
ThreeDRNAseq.app()
The main R packages used in the 3D RNA-seq analysis. If any other R packages are missing from your PC, please install accordingly.
Usage Package Link
Make the shiny app shiny https://shiny.rstudio.com/
shinydashboard https://rstudio.github.io/shinydashboard/
rhandsontable https://jrowen.github.io/rhandsontable/
shinyFiles https://cran.r-project.org/web/packages/shinyFiles/index.html
shinyjs https://cran.r-project.org/web/packages/shinyjs/index.html
DT https://cran.r-project.org/web/packages/DT/index.html
Read csv file readr https://cran.r-project.org/web/packages/readr/index.html
Expression generator tximport http://bioconductor.org/packages/release/bioc/html/tximport.html
Batch effect estimation RUVSeq https://bioconductor.org/packages/release/bioc/html/RUVSeq.html
3D analysis limma https://bioconductor.org/packages/release/bioc/html/limma.html
edgeR https://bioconductor.org/packages/release/bioc/html/edgeR.html
Heatmap ComplexHeatmap https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html
Cluster of heatmap fastcluster https://cran.r-project.org/web/packages/fastcluster/index.html
Flow charts Gmisc https://cran.r-project.org/web/packages/Gmisc/index.html
Euler plots eulerr https://cran.r-project.org/web/packages/eulerr/index.html
Other plots ggplot2 https://cran.r-project.org/web/packages/ggplot2/index.html
plotly https://cran.r-project.org/web/packages/plotly/index.html
gridExtra https://cran.r-project.org/web/packages/gridExtra/index.html
grid https://cran.r-project.org/web/packages/grid/index.html
Generate report rmarkdown https://cran.r-project.org/web/packages/rmarkdown/index.html

Example data

Example data can be downloaded from: xxx

The example RNA-seq data is a subset from study of Arabidopsis in response to cold (Calixto et al., 2018). Datasets of three time-points (red circles) were extracted from the whole study:

Time-points Description
T2 Second time-point of Day 0, temperature \(20^oC\)
T10 Second time-point of Day 1, temperature \(4^oC\) of cold transition
T19 Second time-point of Day 4, temperature \(4^oC\) of cold acclimation

Each time-point has 3 biological replicates and each biological replicate has 3 sequencing replicates. Transcript quantification was generated by using Salmon (Patro et al., 2017) and AtRTD2-QUASI (Zhang et al., 2017) as transcriptome reference. Expression comparisons were designed as “T10 vs T2” and “T19-T2” to identify the cold response genes and transcripts at both transcription and AS level.

Data download links:

Prepare input data

The following datasets are required in the pipeline (Figure 1):

  • Gene-transcript mapping information in a csv file “mapping.csv” (Figure 1A). In the spreadsheet, the first column is the transcript list with column name “TXNAME” and the second column is the gene list with column name “GENEID”.
  • Sample information in a csv file “samples.csv” (Figure 1B). The spreadsheet must include columns (highlighted in read colour):
    • brep: biological replicates are labelled as “brep1”, “brep2”, “brep3”,… to distinguish the samples.
    • srep: if there are sequencing replicates (the same biological replicate is sequenced multiple times) in the dataset, they must labelled as “srep1”, ‘srep2’, ‘srep3’,…, otherwise, labelled as “srep1”,“srep1”, “srep1”,… for all the samples.
    • condition: conditions of the samples.
    • path: the complete path of transcript quantification, e.g. the “quant.sf” files of Salmon (Patro et al., 2017) and “abundance.h5” files of Kallisto (Bray et al., 2016)
  • The transcript quantification outputs. The folder names of the quantification outputs must match to the path information in the “samples.csv” files. For example, Figure 1C shows the Salmon (Patro et al., 2017) quantification output files.
  • Contrast groups for expression comparison in a csv file “contrast.csv” (Figure 1D). In the spreadsheet, the first column is the treatment conditions with column name “Treatment” and the second column is the control conditions with column name “Control”.

Note: the analysis will read information according to the column names in the spreadsheets. Please use the same column names as in examples.

Figure 1: User provided datasets.


Output data

Four folders will be created in the working directory to save the analysis results:

  • data folder: the intermediate datasets in the analysis will be saved in the “data” folder in R data format (.RData). Advanced R users can open and process these objects in R software. For basic R users, please keep in mind these “.RData” objects are used to store the intermediate datasets.
  • result folder: the DE, DAS and DTU results will be saved as csv files in the “result” folder.
  • figure folder: the figures/plots will be saved in the “figure” folder.
  • report folder: the final report in html, pdf and word formats generated from the App will be saved in the “report” folder.

The detailed descriptions of files saved in the above folders can be found in Section “Saved files in local directory”.

3D RNA-seq App

The App includes 7 pages in the sidebar menu: Introduction, Data generation, Data pre-processing, DE DAS and DTU analysis, Result summary, Advanced plot and Generate report.


Data generation

Set directory and load datasets

To begin with, users can choose a working directory to save the output data, results and figures.

Figure 2: Choose working directory. Please click the “triangles” in the left-side “Directories” box to access and select sub-folders. The click and select in the right-side “Content” box is invalid.

The whole pipeline has a number of steps and each step is related to several intermediate datasets (see Section “Output data”). This table summarise the generation and usage steps of these datasets and whether (True or False) they have been loaded in workspace ready for use. To avoid running the pipeline from very beginning every time the App get refreshed, the following options are provided:

  • If all/part of the required datasets (in .RData format) do not exist in the “data” folder of working directory (see Section: “Saved files in local directory”), users can generate these datasets from corresponding steps.
  • If all/part of the required datasets (in .RData format) already exist in the “data” folder, users can click “Load all” button to load them.
  • To perform analysis or visualise results in a specific step, users also can load the required datasets for individual steps.

Figure 3: Data information for the analysis. The columns:

  • Data – the .RData file names in the “data” folder.
  • In data folder – logical, whether the required datasets exist in the “data” folder.
  • In workspace – logical, whether the required datasets have been uploaded for analysis.
  • Description – description of the datasets.
  • Generate from – the steps to generate the required datasets.
  • Use in – the steps to use the datasets.

Generate gene and transcript expression

The csv files “mapping.csv” and “samples.csv” (see Figure 1A and B) are loaded (Figure 4A) and visualised (Figure 4C) in this step. The tximport R package (Soneson et al., 2016) is used to generate gene and transcript expression of read counts and TPMs (transcript per million reads) based on three options: “no” (default), “scaledTPM” and “lengthScaledTPM”. The “lengthScaledTPM” is recommended since it scales the expression using the transcript length and then the library size over samples (see the tximport documentation for details). The generated gene and transcript expression will be save as R data object txi_genes.RData and txi_trans.RData in the “data” folder, respectively. Spreadsheets of read counts and TPMs will be saved as csv files in the “result” folder.

Figure 4: Generate gene and transcript expression.


Data pre-processing

Once the read counts and TPMs are generated, the data is pre-processed with steps: Merge sequencing replicates (if exist), Remove low expressed transcripts/genes, Estimate batch effects (if exist) and Data normalisation. In each step, quality control plots are generated to optimise the parameters for pre-processing (Figure 5).

Figure 5: Data pre-processing. The figures were generated based on the RNA-seq data from study of Arabidopsis in response to cold (Calixto et al., 2018).


Merge sequencing replicates

The sequencing replicates are generated by sequencing the same biological replicate multiple times, which do not add too much information to the technical variation. Sequencing replicates from the same biological replicate are often added to increase the depth. In the panel of Figure 6A, the sequencing replicate information will be extracted from the “srep” column of the “samples.csv” file ( Figure 1B) to indicate whether to merge the sequencing replicates.

Filter low expression

The low expressed transcripts are filtered based on count per million (CPM) reads: \[ CPM=\frac{X_i}{N}\times 10^6 \] where \(X_i\) is read count of transcript \(i\) and \(N\) is the library size of the sample.

  • An expressed transcript must have \(\geq n\) samples with expression \(\geq m\) CPM.
  • An expressed gene must have at least one expressed transcript.

The sample number cut-off \(n\) and CPM cut-off \(m\) are determined by the mean-variance trend plot (Figure 6B and Figure 6C). Read counts are assumed to be negative binomial distributed of which the mean and variance has a relation: \[ Var(\log_2X)=\frac{1}{\mu}+\phi \] where \(X\) is read count, \(\mu\) is the mean and \(\phi\) is the overdispersion. The expression variance decreasing monotonically with the increasing of the mean. In real RNA-seq data, the low expressed transcripts confound with noise and the expression distribution is different to the expressed transcripts. In the mean-variance trend plot, the low expressed transcripts cause a drop trend in low expression region (Figure 5 and Figure 6C). Therefore, the cut-offs \(n\) and \(m\) to filter the low expression can be optimised until the drop trend in low expression region disappeared in the mean-variance trend plot.

Figure 6: Merge sequencing replicates and filter low expressed genes/transcripts. (D) is the scatter plot of mean-variance trends before and after filtering the low expressed transcripts. The fitted lines of the trends are also presented (Yellow line: before filtering; Red line: after filtering).


Principal component analysis (PCA) and batch effect estimation

PCA is a mathematical method to reduce expression data dimensionality while remaining most of the data variance. The reduction is performed by projecting the data to directions or principal components (PCs) from highest to lowest data variability. Therefore, the data main variance is accessible by investigate top few number of PCs rather than thousands of variables. In Omic data analysis, the first two to four principal components are often used to visualise the similarities and differences of samples, thereby determining the grouping of samples. In the PCA scatter plot, the samples from the same condition often stay close, but far from the samples of other conditions. It can be used as evidence for data quality checking.

PCA plot also can be used to identify whether the RNA-seq data contain batch effects, which are caused by biological replications in different laboratory conditions. Compared with random noise, batch effects can be distinguished owning to the systematic biases of signals across replicates. For example, Figure 7A shows the PCA plot (PC1 vs PC2) of a RNA-seq data with 3 time-points T2, T10 and T19. Each time-point has 3 biological replicates. It can be seen that the first biological replicate (brep1) stays in a separate cluster to other two replicates, which indicates a clear batch effect of the data. In this pipeline, the RUVSeq R package (Risso et al., 2014) is used to estimate the batch effects. The RUVSeq algorithm generates batch effect terms which can be incorporated in the design matrix of linear regression model for 3D analysis, i.e.

\[ \text{observed expression = baseline effects + batch effects + noise} \]

It also generates a pseudo read count matrix in which the batch effects have been removed from the data. To avoid data over-modification, this matrix is only used to make PCA plot, but not used for downstream 3D analysis (Figure 7B).

In this panel (Figure 7), users can select and visualise different PCs based on transcript level or gene level expression, or the average expression of biological replicates. The scatter points can be grouped and coloured according to biological replicates or conditions. Ellipses or polygons can be added to the plots to highlight the grouped clusters. Three options, RUVr, RUVs and RUVg, in the RUVSeq R package are provided to estimate the batch effects (see the package documentation for details). Users can “Update PCA plot” to visualise whether the batch effects are removed properly. If there are no clear batch effects in the PCA plot, please skip this step.

Figure 7: PCA plot before (A) and after (B) remove batch effects.


Data normalisation

To enable unbiased comparisons across samples, read counts must be normalised on the basis of sequencing depths. Normalisation methods Trimmed Mean of M-values (TMM), Relative Log Expression (RLE) and upper-quartile can be used to reduce the effects from the systematic technical biases across samples (Bullard et al., 2010). Box plots are used to visualise the expression distribution of raw read counts and normalised expression across samples.

Figure 8: Data normalisation.

3D RNA-seq analysis

Definition of 3D in this pipeline


Statistics

  • Contrast groups: refer to the conditions to compare, e.g. “B-A” and “C-A” compares the conditions “B” and “C” to condition “A”.
  • Adjusted p-value: in the expression comparative analysis, each gene/transcript is fitted with a statistical model, reporting a p-value. However, the testing is performed over a substantial number of genes/transcripts and each testing has a probability of making errors. Hence, all p-values in the analysis are adjusted to control the false discovery rate (FDR). Please see the R function p.adjust for details.
  • \(L_2FC\): \(\log_2\) fold change of expression based on contrast groups.
  • \(\Delta PS\): Transcript level PS (percent of splice) is defined as the ratio of transcript average abundance of conditions divided by the average gene abundance. \(\Delta PS\) is the PS differences of conditions based on the contrast groups. We recommend to use TPMs to calculate PS values. The option to generate PS from read counts is also provided (Figure 9C).

DE gene/transcript analysis

A gene/transcript is identified as DE in a contrast group if \(L_2FC\) of expression \(\geq\) cut-off and with adjusted p-value < cut-off.

Two pipelines are provided for DE analysis: “limma-voom” and “edgeR”. The “edgeR” pipeline includes two methods: “glmQL” (Genewise Negative Binomial Generalized Linear Models with Quasi-likelihood Tests) and “glm” (Genewise Negative Binomial Generalized Linear Models). In limma pipeline, \(L_2FCs\) are calculated based count per million reads (CPMs) while in edgeR, \(L_2FCs\) are based on read counts. From several real RNA-seq data analyses, high consensus is achieved between “limma-voom” and “glmQL” (>90%) and they have more stringent controls of false discovery rate than the “glm” method.

DAS gene and DTU transcript analysis

To test differential alternative splicing (AS), the gene level expression is compared to transcript level expression in the contrast groups. A gene is DAS in a contrast group if adjusted p-value < cut-off and at least one transcript of the gene with \(\Delta PS \geq\) cut-off. A transcript is DTU if adjusted p-value < cut-off and \(\Delta PS \geq\) cut-off.

Two pipelines for AS analysis:

  • limma pipeline: To identify DAS genes, the expression of each transcript is compared to the weighted average expression of all the transcripts for the same gene (weight on transcript expression variance; the average expression can be treated as gene level expression). P-value of each test is converted to genewise p-value by using F-test or Simes method. To identify DTU transcript, each transcript is compared to the weighted average of all the other transcripts for the same gene. See the diffSplice function in limma package (Ritchie et al., 2015) for details.
  • edgeR pipeline: To identify DTU transcripts, log-fold-change of each transcript is compared to log-fold-change of entire gene (sum of all transcripts). The DTU transcript test p-values are summarised to genewise p-value by using F-test or Simes method to report DAS genes. See diffSpliceDGE function in “edgeR” R package (Robinson et al., 2011) for details.

To perform the analysis, the contrast groups in csv file (Figure 1D) can be loaded in panel Figure 9A. 3D analysis pipeline, \(\Delta PS\) and cut-offs of test statistics can be selected/generated in panels Figure 9B and C. Subset of \(\Delta PS\) can be visualised in panel Figure 9D.

Figure 9: 3D RNA-seq analysis panels.


Result summary

This page includes panels to show the 3D analysis results (Figure 10-12):

  • The test statistics in different contrast groups, e.g. adjusted p-value and \(L_2FC\).
  • The number of genes and transcripts with significant expression changes in contrast groups. The numbers of DE only genes, DAS only genes, both DE and DAS genes, DE only transcripts, DTU only transcripts and both DE and DTU transcripts. The numbers and their comparisons are present as Euler diagrams and flow charts.
  • Up- and down-regulation gene and transcript numbers are shown as bar plots.
  • All the tables and plots can be saved to local folder in the working directory (see Section: “Saved files in local directory”).

Figure 10: Number of DE genes, DAS genes, DE transcripts and DTU transcripts, and bar plots of up-down regulation numbers.

Figure 11: Euler diagram of numbers between contrast groups (A) and numbers to compare DE vs DAS genes and DT vs DTU transcripts (B).

Figure 12: Flow charts to show numbers of DE and/or DAS genes and DE and/or DTU transcripts.


Advanced plot

Heatmap

Users can make heatmap for DE genes, DAS genes, DE transcripts and DTU transcripts identified from the analysis (Figure 13A). Users can also upload a gene or transcript list in csv file to make heatmap for specific list (Figure 13B). To make the heatmap, the average TPMs of conditions for the targets are standardized into z-scores. In corresponding plots, the DAS only genes or DTU only transcripts are excluded since they only have AS changes rather than significant abundance changes. The dist and hclust R functions are used to perform the hierarchical clustering analysis. The heatmaps and the target list in each cluster of the heatmaps can be saved to local folder.

Figure 13: Heatmap panel (A) and example of user provided gene list in csv file to make heatmap (B). To avoid confliction, if the radio button “Select target” is selected, the “Or Choose target list csv file” function is enabled; If the radio button “Upload target list” is selected, the “Select targets” function is enabled.


Profile plot

Gene and transcript expression profiles (TPMs or read counts) and PS (percent spliced) can be visualised by typing a gene name (Figure 14A). Users can also generate a number of plots by provides a gene list (Figure 13B) and all the plots will be saved to “Multiple profile plot” folder in the “figure” folder of working directory (Figure 14B).

Figure 14: Plot of expression profiles and PS across conditions


GO annotation plot

Users can generate DE and DAS gene list by click “Generate” button (Figure 15B). These gene lists can be loaded to Gene Ontology (GO) analysis tools/databases (e.g. DAVID and agriGO) to generate GO annotation. A csv file with GO annotation information is required to generate the annotation plot (Figure 15B). The file includes a “Category” column of CC (cellular component), BP (biological process) and MF (molecular function), a column of “Term” of GO annotation and columns of statistics to report the annotation enrichment, e.g. count, FDR, -log10(FDR), etc. (Figure 15A). In the panel, users can select different statistics to visualise. The selected gene list type will present in the plot title to distinguish whether the provided gene list is DE or DAS genes.

Figure 15: GO annotation plot panel.


Generate report

The table in this page lists the parameters for 3D analysis and the locations of saved files, which will be used to generate the report. If the information is wrong, users can double click the cells in the table to correct to values. Based on this information, reports in three formats, word, pdf and html, will be generated and saved in the “report” folder of working directory.

Figure 16: Generate report panel.


Saved files in local directory

Saved files in the “data” folder

The intermediate datasets in “.RData” format are saved in the “data” folder. Advanced R users can access these “.RData” objects by R command line. For the returned objects from R functions/packages, please go to corresponding documentations for details.
File names Description
contrast.RData Contrast group
DAS_genes.RData Testing statistics of DAS genes.
DE_genes.RData Testing statistics of DE genes.
DE_trans.RData Testing statistics of DE transcripts.
deltaPS.RData Matrix of deltaPS values.
DTU_trans.RData Testing statistics of DTU transcripts.
genes_3D_stat.RData The returned object of DE gene analysis by using limma.pipeline/edgeR.pipeline functions in ThreeDRNAseq R package.
genes_batch.RData The returned object of gene level batch effect estimation by using RUVSeq R package.
genes_dge.RData The returned object of gene expression normalization by using calcNormFactors function in edgeR.
mapping.RData Matrix of transcript-gene mapping.
PS.RData Matrix of transcript percent spliced (PS), which is used to generate deltaPS based on contrast groups.
samples.RData The sample information before merging sequencing replicates.
samples_new.RData The sample information after merging sequencing replicates. If there are no sequencing replicates, the “sample_new” object has the same number of rows with the “samples” object.
target_high.RData A list object with two element: expressed transcripts (trans_high) and expressed genes (genes_high).
trans_3D_stat.RData The returned object of DAS gene, DE and DTU transcripts analysis by using limma.pipeline/edgeR.pipeline functions in ThreeDRNAseq R package.
trans_batch.RData The returned object of transcript level batch effect estimation by using RUVSeq R package..
trans_dge.RData The returned object of transcript expression normalization by using calcNormFactors function in edgeR.
txi_genes.RData The returned object of gene expression by using tximport R package.
txi_trans.RData The returned object of transcript expression by using tximport R package.

Saved files in the “result” folder

The csv files in the result folder of working directory. Raw read counts/TPMs are the datasets before any data pre-processing, e.g. sequencing replicate merge and low expression filters.
File.names Description
counts_genes.csv Gene level raw read counts
counts_trans.csv Transcript level raw read counts
DAS genes.csv Testing statistics of DAS genes.
data.info.csv Data information during data pre-processing.
DE DAS DTU numbers.csv Numbers of 3D genes/transcripts in contrast groups.
DE genes.csv Testing statistics of DE genes.
DE transcripts.csv Testing statistics of DE transcripts.
DE vs DAS gene number.csv Numbers of DE vs DAS genes in contrast groups.
DE vs DTU transcript number.csv Numbers of DE vs DTU transcripts in contrast groups.
DTU transcripts.csv Testing statistics of DTU transcripts.
Parameter summary.csv Methods/Parameters/Cut-offs used for 3D RNA-seq analysis.
Target in each cluster heatmap * DE&DAS genes.csv DE&DAS gene lists in clusters of DAS gene heatmap. The DASonly genes are excluded since they have no significant abundance changes across samples.
Target in each cluster heatmap * DE genes.csv DE gene list in clusters of DE gene heatmap.
Target in each cluster heatmap * DE trans.csv DE&DTU transcript lists in clusters of DTU transcript heatmap. The DTUonly transcripts are excluded since they have no significant abundance changes across samples.
Target in each cluster heatmap * DE&DTU trans.csv DE transcript list in clusters of DE transcript heatmap.
TPM_genes.csv Gene level raw TPMs
TPM_trans.csv Transcript level raw TPMs

Saved files in the “figure” folder

Figures are saved to “figure” folder in both “png” and “pdf” formats.
File.names Description
DAS genes GO annotation plot.png/.pdf DAS genes GO annotation plot
DAS genes updown regulation numbers.png/.pdf DAS genes updown regulation numbers
DE genes GO annotation plot.png/.pdf DE genes GO annotation plot
DE genes updown regulation numbers.png/.pdf DE genes updown regulation numbers
DE transcripts updown regulation numbers.png/.pdf DE transcripts updown regulation numbers
DTU transcripts updown regulation numbers.png/.pdf DTU transcripts updown regulation numbers
Gene data distribution.png/.pdf Gene data distribution
Gene mean-variance trend.png/.pdf Gene mean-variance trend
Gene PCA Average expression.png/.pdf Gene PCA Average expression
Gene PCA batch effect removed Bio-reps.png/.pdf Gene PCA batch effect removed Bio-reps
Gene PCA Bio-reps.png/.pdf Gene PCA Bio-reps
Heatmap DAS genes.png/.pdf Heatmap DAS genes
Heatmap DE genes.png/.pdf Heatmap DE genes
Heatmap DE transcripts.png/.pdf Heatmap DE transcripts
Heatmap DTU transcripts.png/.pdf Heatmap DTU transcripts
Transcript data distribution.png/.pdf Transcript data distribution
Transcript mean-variance trend.png/.pdf Transcript mean-variance trend
Transcript PCA Average expression.png/.pdf Transcript PCA Average expression
Transcript PCA batch effect removed Bio-reps.png/.pdf Transcript PCA batch effect removed Bio-reps
Transcript PCA Bio-reps.png/.pdf Transcript PCA Bio-reps
Union set DE genes vs DAS genes.png/.pdf Flow chart -Union set DE genes vs DAS genes
Union set DE transcripts vs DTU transcripts.png/.pdf Flow chart -Union set DE transcripts vs DTU transcripts

Saved files in the “report” folder

Reports in three formats, html, pdf and word, are generated from 3D RNA-seq analysis results/figures.
File.names Description
report.docx word format
report.html html format
report.pdf pdf format

References

Bray,N.L., Pimentel,H., Melsted,P., and Pachter,L. (2016) Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol., 34, 525–527.

Bullard,J.H., Purdom,E., Hansen,K.D., and Dudoit,S. (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94.

Calixto,C.P.G., Guo,W., James,A.B., Tzioutziou,N.A., Entizne,J.C., Panter,P.E., Knight,H., Nimmo,H.G., Zhang,R., and Brown,J.W.S. (2018) Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome. bioRxiv, 251876.

Law,C.W., Chen,Y., Shi,W., and Smyth,G.K. (2014) voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol, 15, R29.

Patro,R., Duggal,G., Love,M.I., Irizarry,R.A., and Kingsford,C. (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods, 14, 417–419.

Risso,D., Ngai,J., Speed,T.P., and Dudoit,S. (2014) Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol., 32, 896–902.

Ritchie,M.E., Phipson,B., Wu,D., Hu,Y., Law,C.W., Shi,W., and Smyth,G.K. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res, 43, e47.

Robinson,M., Mccarthy,D., Chen,Y., and Smyth,G.K. (2011) edgeR : differential expression analysis of digital gene expression data User ’ s Guide. Most, 23, 1–77.

Soneson,C., Love,M.I., and Robinson,M.D. (2016) Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4, 1521.

Session information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.2.0      Rcpp_0.12.18   
##  [9] stringi_1.1.7   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.17   evaluate_0.11